[AMORO-4235] Handling stale ack gracefully when task is reset by OptimizerKeeper#4239

Open
zhangwl9 wants to merge 1 commit into apache:master from zhangwl9:AMORO-fixup-task-reset-bug-dev

@zhangwl9 zhangwl9 commented May 29, 2026

Contributor

Why are the changes needed?

Close #4235.

Brief change log

When OptimizerKeeper detects that an optimizer has expired, it resets that
optimizer's tasks by clearing the task token. The optimizer may still be polling
and can attempt to ack those now-stale tasks. Previously this caused ERROR logs
and unexpected behavior.

This fix adds exception detection in OptimizerExecutor.ackTask() to recognize the
"Task has been reset" exception. When detected:

  • The log level is reduced from ERROR to WARN
  • ackTask() returns false, so the stale task is skipped instead of executed (as expected)
  • The task will be re-polled by another optimizer and re-executed
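
The handling described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this patch: the class `StaleAckSketch`, the `AckCall` interface, and the helper `isTaskReset` are hypothetical names, the marker string is taken from the description above, and it is an assumption that the exception is recognized by its message and that other ack failures also return false.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class StaleAckSketch {
  private static final Logger LOG = Logger.getLogger(StaleAckSketch.class.getName());

  // Assumption: the reset is recognized by this message fragment (from the PR description).
  static final String TASK_RESET_MARKER = "Task has been reset";

  /** Hypothetical stand-in for the remote call that acknowledges a task. */
  interface AckCall {
    void ack() throws Exception;
  }

  /** Returns true if the error or any of its causes carries the reset marker. */
  static boolean isTaskReset(Throwable error) {
    for (Throwable t = error; t != null; t = t.getCause()) {
      String message = t.getMessage();
      if (message != null && message.contains(TASK_RESET_MARKER)) {
        return true;
      }
    }
    return false;
  }

  /**
   * Returns true when the ack succeeded and the task should be executed,
   * false when the task must be skipped.
   */
  static boolean ackTask(String taskId, AckCall call) {
    try {
      call.ack();
      return true;
    } catch (Exception e) {
      if (isTaskReset(e)) {
        // Stale ack: the task was reset on the server side, so this is expected. Log at WARN.
        LOG.warning("Task " + taskId + " was reset before ack, skipping execution");
      } else {
        // Any other failure is still reported at ERROR level.
        LOG.log(Level.SEVERE, "Failed to ack task " + taskId, e);
      }
      return false;
    }
  }
}
```

In this sketch a caller would execute the task only when `ackTask` returns true. A skipped task is left for whichever optimizer polls it next.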

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before making a pull request

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@github-actions github-actions Bot added the module:ams-server Ams server module label May 29, 2026
@codecov-commenter

codecov-commenter commented May 29, 2026

Codecov Report

❌ Patch coverage is 57.14286% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 30.14%. Comparing base (99fcc08) to head (9f09d75).
⚠️ Report is 2 commits behind head on master.

Files with missing lines                               Patch %   Lines
...rg/apache/amoro/server/optimizing/TaskRuntime.java  57.14%    2 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #4239      +/-   ##
============================================
+ Coverage     23.09%   30.14%   +7.04%     
- Complexity     2706     4382    +1676     
============================================
  Files           463      680     +217     
  Lines         42826    55272   +12446     
  Branches       6044     7092    +1048     
============================================
+ Hits           9891    16662    +6771     
- Misses        32076    37352    +5276     
- Partials        859     1258     +399     
Flag    Coverage Δ
core    30.14% <57.14%> (?)
trino   ?

@zhangwl9 zhangwl9 changed the title from "[AMORO-4235] Fix the issue of optimizing tasks get stuck when stale ack arrives after task reset by OptimizerKeeper" to "[AMORO-4235] Handling stale ack gracefully when task is reset by OptimizerKeeper" Jun 8, 2026
@zhangwl9 zhangwl9 force-pushed the AMORO-fixup-task-reset-bug-dev branch from 9f09d75 to 7721dc0 Compare June 8, 2026 08:00
@github-actions github-actions Bot added module:ams-optimizer AMS optimizer module and removed module:ams-server Ams server module labels Jun 8, 2026

Labels

module:ams-optimizer AMS optimizer module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: The task submission failed, with no retries after the timeout failure, and it remained blocked: Task has been reset or not yet scheduled

2 participants